Program review in the physical sciences may benefit from a framework within which to 
quantitatively discuss the scientific merit of a proposed theoretical program of research, and to assess 
the scientific merit of a particular theoretical paper. This article interprets a previously proposed 
measure of experimental scientific merit in a manner appropriate for quantifying the scientific merit 
of completed and proposed theoretical research. With this interpretation, the resulting figure of 
merit represents a proposal for a quantitative measure of total scientific merit. 
I. MOTIVATION 

A quantitative measure of experimental scientific merit 
was proposed in Ref. [1]. This article shows that the 
same measure, appropriately interpreted, can be used as 
a quantitative measure of theoretical scientific merit. The 
measure, encompassing both experimental and theoreti- 
cal scientific merit, is thus a proposed measure of total 
scientific merit. 

In the context of determining which research programs 
to pursue, review committees often must decide the rela- 
tive scientific merits of proposed research directions, both 
experimental and theoretical. Similar decisions are made 
at all levels, charting directions for entire fields, for large 
collaborations, for individual university groups, and for 
individual scientists, and over timescales ranging from 
a decadal plan for a field to how an individual scien- 
tist chooses to allocate her next hour of research time. 
These issues arise in the discussion of research directions 
in which the result is not yet known. 

A related issue is faced by those assessing the scientific 
merit of a particular theoretical or experimental paper. 
This topic is the subject of much innocuous lunchroom 
conversation and, more seriously, figures in the evaluation 
of the organizations and individuals responsible for producing 
the result. Even in the most quantitative subfields of the 
physical sciences, the discussions leading to these decisions 
and evaluations are notably non-quantitative. 

If direct technological applications are possible, the rel- 
evant figure of merit should be something like number of 
lives saved, tons of reduced carbon emissions, or mon- 
etary profit. This article does not address science with 
immediate technological implications. The figure of merit 
constructed in this article is designed to assess theoret- 
ical developments whose technological implications are 
sufficiently remote to be highly uncertain, leaving their 
primary short term benefit to be scientific rather than 
technological. 

To make this article self-contained, Section II briefly 
reviews the quantitative measure of experimental scientific 
merit previously proposed in Ref. [1]. Section III 
presents an interpretation in which the measure proposed 
in Ref. [1] can also be used to quantify theoretical scientific 
merit. Section IV provides examples showing how 
this figure of merit can be applied. Section V discusses 
potential advantages of adopting this figure of merit in 
practice. Section VI summarizes. 



II. REVIEW OF EXPERIMENTAL SCIENTIFIC 
MERIT 



The essential idea of Ref. [1] is that the value of a par- 
ticular experimental result in an academic field should 
be measured by how much is learned from the result. 
Equivalently, the value of a result is how surprised you 
are that the particular result has been obtained. An ex- 
periment confirming an effect already predicted with high 
confidence does not teach us much, while an experiment 
producing an unanticipated result can teach us a great 
deal. Section II of Ref. [1] develops this basic idea into 
a quantitative measure of experimental scientific merit, 
borrowing elementary concepts from information theory. 

Adopting the notation of Ref. [1], let 
$Y = \{y_1, \ldots, y_j, \ldots, y_m\}$ denote a set of qualitatively distinct 
and mutually incompatible states of knowledge, the $j$th 
of which is generally accepted to be correct with probability 
$p(y_j)$, and let $X = \{x_1, \ldots, x_i, \ldots, x_n\}$ denote 
possible outcomes of an experiment, the $i$th of which is 
expected to be realized with probability $p(x_i)$ [4]. Since 
the process of normal science relies to a significant degree 
on a scientific community's shared view of important 
problems and possible solutions, in practice there is 
little difficulty defining $X$ and $Y$ [1]. The sets $X$ and 
$Y$ are assumed to be complete and their individual elements 
orthogonal, so that $p(x_{i_1} \cap x_{i_2}) = p(y_{j_1} \cap y_{j_2}) = 0$ 
for $i_1 \neq i_2$ and $j_1 \neq j_2$, and $\sum_i p(x_i) = \sum_j p(y_j) = 1$ [5]. 

The evidence the result $x_i$ provides in favor of the state 
of knowledge $y_j$ is 

$$\mathrm{evidence}(y_j|x_i) = \log_{10} \frac{p(y_j|x_i)}{p(y_j)}. \qquad (1)$$

Since in practice the state of knowledge $y_j$ is not known 
to be correct, the scientific merit of obtaining the experimental 
result $x_i$ is the evidence the result $x_i$ provides 
in favor of $y_j$, averaged over possibly correct states of 
knowledge $y_j$, weighted according to the current expectation 
that each $y_j$ is correct. The scientific merit of the 
experimental result $x_i$ is thus the information gain, where 

$$\mathrm{information\ gain}(x_i) = \sum_j p(y_j|\mathrm{now})\, \mathrm{evidence}(y_j|x_i) = \sum_j p(y_j|\mathrm{now}) \log_{10} \frac{p(y_j|x_i)}{p(y_j)}, \qquad (2)$$

denoting by $p(y_j|\mathrm{now})$ the current expectation that $y_j$ 
is correct. Immediately after the result $x_i$ is obtained, 
$p(y_j|\mathrm{now}) = p(y_j|x_i)$. The scientific merit of an experiment 
not yet performed is the expected value of the scientific 
merit of its potential results, 

$$\Delta H = \sum_i p(x_i)\, \mathrm{information\ gain}(x_i) = \sum_{i,j} p(x_i)\, p(y_j|x_i) \log_{10} \frac{p(y_j|x_i)}{p(y_j)}, \qquad (3)$$

where $\Delta H$ denotes the expected decrease in information 
entropy associated with the states of knowledge $Y$ upon 
performing the experiment $X$. 

Further derivation and discussion are provided in 
Section II of Ref. [1]. 
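
As an illustration only, Eqs. 1 to 3 can be evaluated numerically in a few lines of Python. The sketch below is not taken from Ref. [1]; the function names and the two-state, two-outcome beliefs are assumptions chosen for illustration, and base-10 logarithms are used as in the equations above.

```python
import math

def evidence(posterior, prior):
    """Eq. 1: evidence (base-10 log) that a result provides for one state y_j."""
    return math.log10(posterior / prior)

def information_gain(prior, posterior, now=None):
    """Eq. 2: evidence averaged over states, weighted by the current belief.

    prior, posterior, now are lists of probabilities over the same set Y.
    If now is omitted, the beliefs immediately after the result are used,
    i.e. p(y_j|now) = p(y_j|x_i).
    """
    if now is None:
        now = posterior
    return sum(w * evidence(q, p)
               for w, q, p in zip(now, posterior, prior) if w > 0)

def expected_information_gain(p_x, p_y_given_x, p_y):
    """Eq. 3: expected merit of an experiment that has not yet been performed."""
    return sum(px * information_gain(p_y, post)
               for px, post in zip(p_x, p_y_given_x))

# Hypothetical experiment: two states of knowledge, two possible outcomes.
p_y = [0.5, 0.5]             # assumed prior beliefs in y_1 and y_2
p_x = [0.5, 0.5]             # assumed probabilities of outcomes x_1 and x_2
p_y_given_x = [[0.9, 0.1],   # assumed beliefs after outcome x_1
               [0.1, 0.9]]   # assumed beliefs after outcome x_2

delta_h = expected_information_gain(p_x, p_y_given_x, p_y)
print(round(delta_h, 3))
```

For these assumed beliefs, either outcome yields an information gain of roughly 0.16, so the expected information gain of the experiment is also roughly 0.16.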

III. THEORETICAL SCIENTIFIC MERIT 

The experimental figure of merit assumes a complete 
set of possible states of knowledge $Y$. An experimental 
outcome $x_i$ adjusts the belief $p(y_j)$ that each state of 
knowledge $y_j$ is correct. 

Similarly, the theoretical figure of merit proposed in 
this article assumes a complete set of possible states of 
knowledge $Y$. A theoretical result $x_i$ adjusts the belief 
$p(y_j)$ that each state of knowledge $y_j$ is correct. The 
theoretical figure of merit proposed in this article is thus 
the same as the experimental figure of merit proposed in 
Ref. [1], with appropriately interpreted $x_i$. 

It is worth emphasizing that in the proposed figure of 
merit, a theoretical paper does not obtain its value by 
extending $Y$, which by assumption is a complete set. A 
theoretical paper obtains its value by articulating reasons 
for adjusting beliefs that some subset of states of knowledge 
are correct, based on simplicity or agreement with 
existing data. 

Any serious proposal for quantifying theoretical sci- 
entific merit must satisfy a basic property of self con- 
sistency: the sum of the merits of two separate papers 
containing a body of experimental and/or theoretical re- 
sults must equal the merit of a single paper containing 
the same body of results. Current popular proxies for sci- 
entific merit, such as number of publications and number 
of citations, violate this basic property of self consistency. 
The figure of merit proposed in this article uniquely satis- 
fies this desired additive property, which is used as the ba- 
sis of the derivation of the experimental figure of merit in 
Ref. [1]. This additive property allows the figure of merit 
to be divided by cost to obtain a well-defined "bang per 
buck," quantifying scientific value per dollar of funding. 
The bang per buck can in turn be used in the research 
portfolio allocation problem faced at all levels in the sci- 
entific enterprise. 
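
The additive property can be checked numerically. The following Python sketch assumes, following the usage in the historical example of Section IV, that each contribution is weighted by the current beliefs. The three-state beliefs, the function name `merit`, and the costs are hypothetical and serve only to illustrate the bookkeeping.

```python
import math

def merit(before, after, now):
    """Information gain of a result that moves beliefs from before to after,
    weighted by the current beliefs now (all over the same complete set Y)."""
    return sum(w * math.log10(a / b) for w, a, b in zip(now, after, before))

# Hypothetical beliefs over three states of knowledge (illustrative numbers).
p0 = [0.70, 0.20, 0.10]   # before either paper
p1 = [0.50, 0.40, 0.10]   # after the first paper
p2 = [0.20, 0.70, 0.10]   # after the second paper (the current beliefs)

first = merit(p0, p1, now=p2)
second = merit(p1, p2, now=p2)
combined = merit(p0, p2, now=p2)   # a single paper containing both results

# Hypothetical costs in arbitrary units, used only to show the division.
costs = {"first": 2.0, "second": 5.0}
bang_per_buck = {"first": first / costs["first"],
                 "second": second / costs["second"]}

print(first + second, combined)
print(bang_per_buck)
```

Because the logarithms of successive belief ratios telescope, the two separate merits sum to the merit of the combined paper for any choice of intermediate beliefs, which is what makes the division by cost well defined.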

With the value of both experimental and theoretical re- 
sults arising from how they change beliefs on the set Y, 
the figure of merit proposed in this article allows a quan- 
titative comparison of the scientific merit of experimental 
results with the scientific merit of theoretical results. 



IV. EXAMPLES 

Three short examples will serve to clarify the proposed 
figure of merit. 

Let SM denote the particle physics Standard Model, 
believed to be correct with probability $p(\mathrm{SM}) = 1/2$. A 
new theoretical paper $x$ articulates a model $\mathcal{H}$, pointing 
out consistency with existing experimental data and noting 
certain elegant features. The model $\mathcal{H}$ comes out of 
left field; no other person has published along remotely 
similar lines. The prior expectation $p(\mathcal{H})$ that $\mathcal{H}$ is the 
correct model, corresponding to your belief if someone 
were to describe the outline of the idea to you on the 
street without the supporting justification provided by 
the paper, is taken to be $p(\mathcal{H}) \approx 10^{-10}$. The somewhat 
arbitrary choice of $p(\mathcal{H}) = 10^{-10}$, which corresponds 
roughly to giving every human being's pet theory equal 
weight, will not greatly affect the result. The sum of the 
beliefs of all other models in $Y$ is $1 - p(\mathrm{SM}) - p(\mathcal{H})$. 

After the paper is absorbed by the field, the model $\mathcal{H}$ 
is believed to be correct with probability $p(\mathcal{H}|x) = \epsilon$, the 
Standard Model is believed to be correct with probability 
$p(\mathrm{SM}|x) = 1/2 - (\epsilon - 10^{-10})$, and the sum of the beliefs 
of all other models in $Y$ is unchanged. Assuming 
$1 \gg \epsilon \gg 10^{-10}$, the information gain resulting from the paper $x$ is 

$$\begin{aligned}
\mathrm{information\ gain}(x) &= p(\mathrm{SM}|x) \log_{10} \frac{p(\mathrm{SM}|x)}{p(\mathrm{SM})} + p(\mathcal{H}|x) \log_{10} \frac{p(\mathcal{H}|x)}{p(\mathcal{H})} \\
&\approx (1/2 - \epsilon)(-2\epsilon \log_{10} e) + \epsilon\,(10 + \log_{10} \epsilon) \\
&\approx \epsilon\,(9.6 + \log_{10} \epsilon), \qquad (4)
\end{aligned}$$

where terms of order $\epsilon^2$ and $10^{-10}$ have been dropped [6]. 
If the new model $\mathcal{H}$ is believed to be correct with probability 
$p(\mathcal{H}|x) = \epsilon \approx 10^{-4}$, then the scientific merit of the 
paper calculated from Eq. 4 is roughly 6e-4 [7]. 
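
The approximation in Eq. 4 can be compared against the exact two-term sum. The Python sketch below is a numerical check under the assumptions stated in the text ($p(\mathrm{SM}) = 1/2$, $p(\mathcal{H}) = 10^{-10}$, all other beliefs unchanged); the function names are illustrative and not part of the original analysis.

```python
import math

def info_gain_new_model(eps, p_sm=0.5, p_h=1e-10):
    """Exact two-term information gain for the first example.

    Beliefs in all other models are unchanged, so they contribute nothing.
    """
    p_sm_post = p_sm - (eps - p_h)   # the Standard Model loses what H gains
    return (p_sm_post * math.log10(p_sm_post / p_sm)
            + eps * math.log10(eps / p_h))

def info_gain_approx(eps):
    """Leading-order form of Eq. 4, intended for 1 >> eps >> 1e-10."""
    return eps * (10 - math.log10(math.e) + math.log10(eps))

eps = 1e-4
exact = info_gain_new_model(eps)
approx = info_gain_approx(eps)
print(f"{exact:.2e} {approx:.2e}")
```

For $\epsilon = 10^{-4}$ both expressions give about 5.6e-4, consistent with the rounded value quoted above.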

In this example, the scale of the scientific merit of the 
paper $x$ articulating the novel hypothesis $\mathcal{H}$ is set by the 
posterior belief $p(\mathcal{H}|x) = \epsilon$ that $\mathcal{H}$ is actually correct. A 
multiplicative constant of order unity (equal to roughly 
6 in this case) credits novelty. The theoretical scientific 
merit of many proposed models in particle physics can 
be calculated in this manner, using appropriate values of $\epsilon$. 


The proposal that the scientific merit of a paper $x$ articulating 
a new hypothesis $\mathcal{H}$ should be closely related 
to the posterior belief $p(\mathcal{H}|x)$ that $\mathcal{H}$ is actually correct 
may seem strange to fields accustomed to using number 
of citations as a proxy for scientific worth. The proposed 
measure should however align with intuition after 
further reflection. Most scientists agree that simple 
(elegant) theories are generally better than complicated 
(ugly) theories, and that theories in agreement with existing 
data are generally better than those in disagreement 
with existing data. Through Bayes' theorem, simplicity 
and fidelity are exactly the quantities that determine the 
posterior belief. Alternatively, one can note that in the 
end the whole point is getting the right answer. Any reasonable 
figure of merit must therefore incorporate, preferably 
in as direct a manner as possible, the current belief 
that the proposed hypothesis $\mathcal{H}$ is in fact the correct 
description of nature. 

Consider as a second example a paper $x$ describing an 
improvement to a calculation within the Standard Model 
that results in closer agreement with existing experimental 
results. Such a paper $x$ may increase the posterior 
belief in the Standard Model from $p(\mathrm{SM}) = 1/2$ to 
$p(\mathrm{SM}|x) = 1/2 + \epsilon$ by removing a discrepancy between 
Standard Model prediction and data. This increased posterior 
belief in the Standard Model comes at the expense 
of alternative models $\mathcal{H}$ that had received attention in 
part due to this discrepancy. The paper $x$ has resulted 
in the reduction of the summed belief of this set of alternative 
models from $p(\mathcal{H}) = \epsilon$ to $p(\mathcal{H}|x) = \epsilon' \ll \epsilon$. The 
information gain resulting from the paper $x$ is 

$$\begin{aligned}
\mathrm{information\ gain}(x) &= p(\mathrm{SM}|x) \log_{10} \frac{p(\mathrm{SM}|x)}{p(\mathrm{SM})} + p(\mathcal{H}|x) \log_{10} \frac{p(\mathcal{H}|x)}{p(\mathcal{H})} \\
&\approx (1/2 + \epsilon)(2\epsilon \log_{10} e) + \epsilon' \log_{10} \frac{\epsilon'}{\epsilon} \\
&\approx \epsilon \log_{10} e, \qquad (5)
\end{aligned}$$

where terms of order $\epsilon'$ and $\epsilon^2$ have been dropped. If 
the Standard Model after the paper $x$ is believed to be 
correct with probability $p(\mathrm{SM}|x) = 1/2 + \epsilon = 51\%$, then 
the scientific merit of the paper calculated from Eq. 5 is 
roughly 4e-3. 
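
As a numerical check of Eq. 5, the sketch below evaluates the two retained terms directly and compares them with the leading-order result. The value chosen for the residual belief in the alternative models is a hypothetical input (the text only requires it to be much smaller than $\epsilon$), and the function name is illustrative.

```python
import math

def info_gain_sm_improvement(eps, eps_prime, p_sm=0.5):
    """Two-term information gain for the second example.

    The Standard Model gains eps; the alternative models drop from eps to
    eps_prime. Contributions from any other models are neglected, as in
    the text.
    """
    p_sm_post = p_sm + eps
    return (p_sm_post * math.log10(p_sm_post / p_sm)
            + eps_prime * math.log10(eps_prime / eps))

eps = 0.01          # p(SM|x) = 51%, as in the text
eps_prime = 1e-5    # hypothetical residual belief, much smaller than eps

exact = info_gain_sm_improvement(eps, eps_prime)
approx = eps * math.log10(math.e)   # leading-order form of Eq. 5
print(f"{exact:.2e} {approx:.2e}")
```

Both values round to 4e-3, matching the figure quoted above; the small difference between them comes from the dropped terms.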

In this example, the scale of the scientific merit of 
the paper $x$ articulating an improvement in calculating 
within the Standard Model is set by the increased confidence 
$p(\mathrm{SM}|x) - p(\mathrm{SM}) = \epsilon$ that the Standard Model is 
actually correct. The theoretical scientific merit of many 
calculational improvements within the particle physics 
Standard Model can be determined in this way, using 
appropriate values of $\epsilon$. 

As a historical example, consider two crucial occurrences 
in the development of general relativity: Einstein's 
theoretical developments up to 1917, and the 1919 expedition 
led by Eddington that confirmed general relativity's 
prediction for the deflection of starlight by the sun. 
Expectations in this example are inferred from the historical 
recounting of Ref. [3]; conclusions drawn should 
be checked for robustness under reasonable variations in 
these expectations. At the end of 1905, after the introduction 
of special relativity but before the series of papers 
culminating in the Einstein field equations, a person 
approaching you on the street and outlining the hypothesis 
that would become general relativity would have been 
believed with an expectation of $p(\mathrm{GR}|1905) \approx 10^{-10}$. 
Newtonian gravity was expected to be correct with 
expectation $p(\mathrm{Newton}|1905) \approx 99\%$, and with probability 
$p(\mathrm{other}|1905) \approx 1\% - 10^{-10}$ any of a number of 
other possibilities might have been correct. After Einstein's 
papers, the scientific community remained skeptical 
that general relativity was indeed a correct description 
of nature. The scientific community's expectation 
in 1917 that general relativity would prove to be correct 
was $p(\mathrm{GR}|1917) \approx 2\%$, leaving $p(\mathrm{Newton}|1917) \approx 97\%$ 
and $p(\mathrm{other}|1917) \approx 1\%$. Eddington's 1919 measurement 
of the bending of starlight around the sun at angles of 
$1.98'' \pm 0.30''$ at Sobral and Crommelin's measurement of 
$1.61'' \pm 0.30''$ at Principe during the same eclipse were 
found to be in significantly better agreement with general 
relativity's prediction of $1.74''$ than with the Newtonian 
prediction of $0.87''$. In the eyes of the scientific community 
this experimental measurement significantly raised 
the expectation that general relativity is the correct 
description of nature, resulting in $p(\mathrm{GR}|1919) \approx 90\%$, 
$p(\mathrm{Newton}|1919) \approx 9\%$, and $p(\mathrm{other}|1919) \approx 1\%$ [8]. 
Giving Einstein full credit for the changes in beliefs 
between 1905 and 1917, the theoretical scientific 
merit of Einstein's general relativity papers in 1917 was 

$$\begin{aligned}
\mathrm{information\ gain}(\mathrm{Einstein}, 1917) &= p(\mathrm{GR}|1917) \log_{10} \frac{p(\mathrm{GR}|1917)}{p(\mathrm{GR}|1905)} \\
&\quad + p(\mathrm{Newton}|1917) \log_{10} \frac{p(\mathrm{Newton}|1917)}{p(\mathrm{Newton}|1905)} \\
&\quad + p(\mathrm{other}|1917) \log_{10} \frac{p(\mathrm{other}|1917)}{p(\mathrm{other}|1905)} \\
&\approx 0.2.
\end{aligned}$$



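Giving Eddington full credit for the changes in beliefs 
between 1917 and 1919, the experimental scientific 
merit of Eddington's measurement in 1919 was 

$$\begin{aligned}
\mathrm{information\ gain}(\mathrm{Eddington}, 1919) &= p(\mathrm{GR}|1919) \log_{10} \frac{p(\mathrm{GR}|1919)}{p(\mathrm{GR}|1917)} \\
&\quad + p(\mathrm{Newton}|1919) \log_{10} \frac{p(\mathrm{Newton}|1919)}{p(\mathrm{Newton}|1917)} \\
&\quad + p(\mathrm{other}|1919) \log_{10} \frac{p(\mathrm{other}|1919)}{p(\mathrm{other}|1917)} \\
&\approx 1.4.
\end{aligned}$$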



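Upon Eddington's measurement in 1919, the theoretical 
scientific merit of Einstein's contribution increased to 

$$\begin{aligned}
\mathrm{information\ gain}(\mathrm{Einstein}, 1919) &= p(\mathrm{GR}|1919) \log_{10} \frac{p(\mathrm{GR}|1917)}{p(\mathrm{GR}|1905)} \\
&\quad + p(\mathrm{Newton}|1919) \log_{10} \frac{p(\mathrm{Newton}|1917)}{p(\mathrm{Newton}|1905)} \\
&\quad + p(\mathrm{other}|1919) \log_{10} \frac{p(\mathrm{other}|1917)}{p(\mathrm{other}|1905)} \\
&\approx 7.5.
\end{aligned}$$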



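The total scientific merit of Eddington and Einstein in 
1919 was 

$$\begin{aligned}
\mathrm{information\ gain}(\mathrm{Eddington}, \mathrm{Einstein}, 1919) &= p(\mathrm{GR}|1919) \log_{10} \frac{p(\mathrm{GR}|1919)}{p(\mathrm{GR}|1905)} \\
&\quad + p(\mathrm{Newton}|1919) \log_{10} \frac{p(\mathrm{Newton}|1919)}{p(\mathrm{Newton}|1905)} \\
&\quad + p(\mathrm{other}|1919) \log_{10} \frac{p(\mathrm{other}|1919)}{p(\mathrm{other}|1905)} \\
&\approx 8.9,
\end{aligned}$$

which is seen to be equal to the sum of Eddington's and 
Einstein's individual scientific merits in 1919, as required 
for self consistency. 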
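
The four figures quoted in this historical example can be reproduced from the stated expectations. The Python sketch below uses the beliefs given in the text for 1905, 1917, and 1919; the dictionary layout and the function name `merit` are illustrative choices, not part of the original analysis.

```python
import math

# Community beliefs as estimated in the text for each year.
beliefs = {
    1905: {"GR": 1e-10, "Newton": 0.99, "other": 0.01 - 1e-10},
    1917: {"GR": 0.02, "Newton": 0.97, "other": 0.01},
    1919: {"GR": 0.90, "Newton": 0.09, "other": 0.01},
}

def merit(before, after, now):
    """Information gain for the change in beliefs from year `before` to year
    `after`, weighted by the beliefs held in year `now`."""
    return sum(beliefs[now][k]
               * math.log10(beliefs[after][k] / beliefs[before][k])
               for k in beliefs[now])

einstein_1917 = merit(1905, 1917, now=1917)
eddington_1919 = merit(1917, 1919, now=1919)
einstein_1919 = merit(1905, 1917, now=1919)
total_1919 = merit(1905, 1919, now=1919)

for name, value in [("Einstein, 1917", einstein_1917),
                    ("Eddington, 1919", eddington_1919),
                    ("Einstein, 1919", einstein_1919),
                    ("Total, 1919", total_1919)]:
    print(f"{name}: {value:.1f}")
```

To one decimal place the results are 0.2, 1.4, 7.5, and 8.9, and the 1919 merits of Eddington and Einstein sum to the total, since the same 1919 weights multiply logarithms whose arguments telescope.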

This historical example raises several interesting 
points. Increasing the belief of a new theory from extremely 
unlikely ($p(\mathrm{GR}|1905) \approx 10^{-10}$) to very likely 
($p(\mathrm{GR}|1919)$ of order unity) is worth 10 points [9], 
distributed according to the evidence, either theoretical or 
experimental, provided by each contributor. The scientific 
merit of Einstein's theoretical work increased substantially 
from 1917 to 1919 due to Eddington's evidence 
supporting general relativity's prediction for the bending 
of starlight by the sun, even though Einstein himself 
arguably did not do much from 1917 to 1919. The increase 
in the scientific merit of Einstein's work after Eddington's 
result is consistent with Einstein's worldwide fame for 
general relativity coming after, rather than prior to, 
Eddington's measurement. Interestingly, the scientific merit 
of Eddington's measurement is within an order of magnitude 
of the scientific merit of Einstein's theoretical development, 
suggesting that Eddington deserves significantly 
more credit than typically given in popular recountings 
of the history of general relativity. Conclusions such as 
these are robust under reasonable variations in estimated 
expectations. 
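
One way to probe this robustness is to recompute the 1919 merits over a grid of alternative expectations. The sketch below is only an illustration: the ranges scanned for $p(\mathrm{GR}|1905)$ and $p(\mathrm{GR}|1917)$ are assumptions made here, not values from Ref. [3], and the remaining beliefs are adjusted so that each year's probabilities still sum to one.

```python
import math
from itertools import product

def merits_1919(p_gr_1905, p_gr_1917):
    """Return (Einstein, Eddington) merits in 1919 for assumed expectations."""
    b1905 = {"GR": p_gr_1905, "Newton": 0.99, "other": 0.01 - p_gr_1905}
    b1917 = {"GR": p_gr_1917, "Newton": 0.99 - p_gr_1917, "other": 0.01}
    b1919 = {"GR": 0.90, "Newton": 0.09, "other": 0.01}
    einstein = sum(b1919[k] * math.log10(b1917[k] / b1905[k]) for k in b1919)
    eddington = sum(b1919[k] * math.log10(b1919[k] / b1917[k]) for k in b1919)
    return einstein, eddington

# Hypothetical ranges of "reasonable" expectations to scan.
ratios = {}
for p05, p17 in product([1e-12, 1e-10, 1e-8], [0.01, 0.02, 0.05]):
    einstein, eddington = merits_1919(p05, p17)
    ratios[(p05, p17)] = einstein / eddington

for key, ratio in sorted(ratios.items()):
    print(key, round(ratio, 2))
```

Over this assumed grid the ratio of Einstein's merit to Eddington's stays between roughly 3 and 9.3, so the two contributions remain within an order of magnitude of each other.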

Ref. [1] provides a number of additional examples, using 
the same formalism to quantify the scientific merits 
of specific completed and proposed particle physics ex- 
periments. 



V. DISCUSSION 

The figure of merit proposed in this article is suffi- 
ciently different from common practice in many fields 
that a brief discussion of salient features may be help- 
ful. 

The figure of merit proposed in this article is well be- 
haved. It is appropriately additive, in the sense that the 
scientific merit of a single article containing two separate 
results is equal to the sum of the scientific merits of two 
different articles describing the results separately. Popu- 
lar alternative proxies for merit, such as number of pub- 
lications or number of citations, violate this basic prop- 
erty of self consistency. This additive property allows the 
proposed measure of theoretical merit to meaningfully be 
divided by cost to obtain a measure of scientific bang per 
buck that can be used directly to optimize a scientific 
portfolio. 

The proposed measure of theoretical scientific merit is 
manifestly consistent with the measure of experimental 
scientific merit previously proposed in Ref. [1]. This fig- 
ure of merit thus constitutes a quantitative measure of to- 
tal scientific merit, encompassing both experimental and 
theoretical work. The figure of merit provides a frame- 
work for quantitatively comparing the scientific merit of 
theoretical and experimental results, for quantitatively 
comparing the scientific merit of proposed theoretical re- 
search with proposed experiments, and for optimizing a 
scientific portfolio consisting of both experimental and 
theoretical research. As a specific example, in most sub- 
fields there has been little quantitative analysis into the 
question of whether an increase in theoretical funding 
relative to experimental funding (or vice versa) would 
increase expected information gain. The figure of merit 
proposed in this article provides a framework for such a 
quantitative analysis. 

Intelligent, conscious maximization of a particular fig- 
ure of merit is expected to result in a world line with 
higher values of that figure of merit than a world line in 
which decisions are made according to other, possibly less 
well defined, criteria. Intelligent, conscious maximization 
of expected information gain is therefore expected to re- 
sult in a world line with greater information gain than 
a world line in which decisions are made according to 
other, possibly less well defined, criteria. 

Use of information content or information gain to eval- 
uate the scientific merit of theoretical contributions re- 
quires the estimation of beliefs that proposed theories are 
correct, and the reader may object that the problem of 
quantifying a theoretical paper's scientific merit has sim- 
ply been reformulated in terms of the estimation of the 
beliefs that theories are correct. At worst, this reformula- 
tion significantly changes and focuses the discussion. The 
fact that there is not a well-developed literature to point 
to for the justification of these beliefs emphasizes the fact 
that until now the importance of estimating these has 
not been properly recognized in assessing the scientific 
merit of theoretical contributions. In most cases, the sci- 
entific conclusion drawn from this reformulation will be 
robust against the variation of these beliefs within their 
justifiable range. Tables I and II in Ref. [1] suggest that 
experimental scientific merit per incremental unit cost 
ranges over several orders of magnitude. A similar range 
in theoretical scientific merit per incremental unit cost is 
expected. 

Some readers may object to the very idea of construct- 
ing an explicit figure of merit to quantify the scientific 
merit of theoretical contributions. These readers should 
bear in mind that this already is done (implicitly, if not 
explicitly) every time a decision of resource allocation or 
promotion is made. In discussions of funding for pro- 
posed research directions and recognition of completed 
research directions, value judgments are made regarding 
the theoretical scientific merit of proposals and results. 
Such value judgments are a necessary part of the scien- 
tific process. The question is therefore not whether the 
theoretical scientific merit of proposals and results should 
be determined, but rather how best to determine it. It 
is surely in each field's interest for such evaluations to be 
made in the sharpest, most open, most quantifiable, and 
scientifically best motivated framework possible. 

Some readers may object to the specific figure of merit 
advocated in this article. Scientists whose theoreti- 
cal research scores higher under more traditional mea- 
sures than under the measure proposed here may be ex- 
pected to be among those voicing the strongest objec- 
tions. These readers are challenged to find a scientifically 
better motivated figure of merit. 



VI. SUMMARY 

This article interprets the measure of experimental 
scientific merit developed in Ref. [1] in a manner 
appropriate for quantifying theoretical scientific merit. 

The choice of a reasonable quantitative figure of merit 
for assessing the scientific merit of proposed theoretical 
research programs can inform and focus program review 
and accompanying decisions of resource allocation 
in many subfields of the physical sciences. The related 
choice of a reasonable figure of merit for assessing the 
scientific merit of any particular theoretical contribution 
can inform the evaluation of those organizations and 
individuals responsible for its production. 

This article advocates that the scientific merit of a 
completed theoretical contribution should be quantified 
by the extent to which it changes beliefs in the correctness 
of competing candidate theories. Change in belief 
of the correctness of competing candidate theories is an 
elementary notion in the context of information theory, 
quantified by information gain, defined in Eq. 2. 

This article advocates that the amount of information 
a program of research is expected to provide is the 
appropriate quantity for assessing the scientific merit of any 
proposed theoretical research direction. Expected information 
gain is a well understood concept in information 
theory, quantified by a change in information entropy 
$\Delta H$, defined in Eq. 3. 

The measure of theoretical scientific merit advocated 
here, although developed with particle physics and cosmology 
foremost in mind, is expected to apply equally 
to other physical sciences in which results may be far 
removed from practical technological application. This figure 
of merit may provide a useful quantitative framework 
within which decisions about future resource allocation 
can be made. 